New Worker-Centric Scheduling Strategies for Data-Intensive Grid Applications
Authors
Abstract
Distributed computations that deal with large amounts of data are scheduled in Grid clusters today using either a task-centric or a worker-centric mechanism. Because the data sets are large, the execution time is bounded by the cost of data transfer. In this paper, we introduce new worker-centric scheduling strategies that implicitly exploit the locality of interest in order to reduce the cost of data transfer. Many Grid applications are characterized by such a locality of interest, i.e., a file is often accessed by multiple tasks and, more importantly, a set of files accessed by one task is also likely to be accessed together by other tasks. Our new deterministic and probabilistic scheduling algorithms implicitly exploit this feature to improve running time. Our experiments use traces of a real Grid application (Coadd) and show that our algorithms achieve utilization of over 90% while significantly reducing makespan compared to task-centric approaches.
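To make the idea of worker-centric scheduling with locality of interest concrete, here is a minimal sketch. It is not the paper's algorithm: the overlap heuristic, the unbounded per-worker cache, the round-robin order in which workers become idle, and the names `pick_task` and `run_worker_centric` are all assumptions made for illustration. An idle worker pulls the pending task whose input files overlap most with the files it already holds, so fewer files need to be transferred.

```python
def pick_task(cache, pending):
    """Return the id of the pending task that shares the most files with `cache`.

    Ties go to the task needing the fewest new transfers, then to the
    earliest-inserted task (hypothetical tie-breaking, not from the paper).
    """
    return max(
        pending,
        key=lambda t: (len(pending[t] & cache), -len(pending[t] - cache)),
    )


def run_worker_centric(tasks, n_workers):
    """Simulate workers pulling tasks one at a time.

    tasks: dict mapping task id -> set of required file names.
    Returns (assignment, transferred), where assignment is a list of
    (worker index, task id) pairs and transferred counts file fetches.
    """
    pending = dict(tasks)                       # copy so the caller's dict is untouched
    caches = [set() for _ in range(n_workers)]  # assumption: unbounded local caches
    assignment = []
    transferred = 0
    worker = 0
    while pending:
        cache = caches[worker]
        task = pick_task(cache, pending)
        files = pending.pop(task)
        transferred += len(files - cache)       # only missing files are fetched
        cache |= files
        assignment.append((worker, task))
        worker = (worker + 1) % n_workers       # assumption: workers go idle in turn
    return assignment, transferred


if __name__ == "__main__":
    demo = {
        "t1": {"a", "b"},
        "t2": {"c", "d"},
        "t3": {"a", "b", "e"},
        "t4": {"c", "d", "f"},
    }
    print(run_worker_centric(demo, n_workers=2))
```

In this toy input, tasks that share files end up on the same worker, so 6 files are fetched instead of the 10 that would be needed if every task fetched all of its inputs.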
Similar resources
A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability
Data Grid is an infrastructure that controls huge amounts of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications pose complex problems such as job scheduling. Most existing scheduling algorithms in Grids only focus on one kind of Grid jobs which can be data...
An ACO Algorithm for Scheduling Data Intensive Application with Various QOS Requirements
Grid computing is rapidly growing in the distributed heterogeneous environment for utilizing and sharing large scale resources to solve complex scientific problems. The main goal of grid computing is to aggregate the power of widely distributed resources and provide non trivial QOS services to the users. To achieve this goal, an efficient grid scheduling algorithm is required. The problem of sc...
Data Replication-Based Scheduling in Cloud Computing Environment
High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...
Application of Discrete Particle Swarm Optimization for Grid Task Scheduling Problem
Many applications involve the concepts of scheduling, such as communications, packet routing, production planning [Zhai et al., 2006], classroom arrangement [Mathaisel & Comm, 1991], aircrew scheduling [Chang, 2002], nurse scheduling [Ohki et al., 2006], food industrial [Simeonov & Simeonovova, 2002], control system [Fleming & Fonseca, 1993], resource-constrained scheduling problem [Chen, 2007]...
Chameleon: A Resource Scheduler in A Data Grid Environment
Grid computing is moving in two directions. The Computational Grid focuses on reducing execution time of applications that require a great number of computer processing cycles. The Data Grid provides the way to solve large scale data management problems. Data intensive applications such as High Energy Physics and Bioinformatics require both Computational and Data Grid features. Job scheduling in Gr...